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Abstract — In this paper, we present a Bayesian channel estima- 
tion algorithm for multicarrier receivers based on pilot symbol 
observations. The inherent sparse nature of wireless multipath 
channels is exploited by modeling the prior distribution of 
multipath components' gains with a hierarchical representation 
of the Bessel K probability density function; a highly efficient, 
fast iterative Bayesian inference method is then applied to 
the proposed model. The resulting estimator outperforms other 
state-of-the-art Bayesian and non-Bayesian estimators, either by 
yielding lower mean squared estimation error or by attaining the 
same accuracy with improved convergence rate, as shown in our 
numerical evaluation. 

I. Introduction 

The accuracy of channel estimation is a crucial factor de- 
termining the overall performance in wireless communication 
systems and networks, in terms of bit-error-rate (BER) and 
throughput but also of location accuracy when these systems 
are equipped with positioning capabilities. When the underly- 
ing structure of the channel responses to be estimated is sparse, 
compressive sensing and sparse signal representation can be 
very powerful tools for the design of channel estimators. 

Compressive sensing techniques have attracted considerable 
attention in recent years due to their ability to be incorporated 
in a wide range of applications. Typically, the signal model 
considered reads 



y 



<t>CY + w 



(1) 



where y G 



is the measurement vector and <I> 



[<f>i, . . . , 4> L ] £ C is the known dictionary matrix with 
L > M column vectors <p h I = 1, ...,L. The vector 
w e c Mxl represents the samples of additive white Gaussian 
noise with covariance matrix A -1 / and precision parameter 
A > 0. Finally, a = [a\, . . . , a_fJ T € C ixl is the vector of 
weights whose entries are mostly zero. By obtaining a sparse 
estimate of a. we can accurately represent 3>a: with a minimal 
number of column vectors in <fr. 

In the literature many Bayesian and non-Bayesian methods 
have been proposed for sparse signal representation. The latter 
methods include the very popular convex optimization based 



methods for LASSO regression Jl], |]2] and greedy construc- 
tive algorithms such as orthogonal matching pursuit (OMP) 
and compressive sampling MP (CoSaMP) [4]. In sparse 
Bayesian learning (SBL) [5|, [6|, a prior probability density 
function (pdf) p(a) is specified so that a sparse estimate a 
is obtained. A widely applied SBL algorithm is the relevance 
vector machine (RVM) [5|, where a hierarchical representa- 
tiorfl of the student-t pdf is used for the prior pdf p(ot). An 
EM algorithm is then derived based on this prior model for the 
estimation of the weights. Similarly, [7 1 uses the EM algorithm 
based on a hierarchical representation of the Laplace pdf @ This 
algorithm can be seen as the Bayesian version of the LASSO 
estimator. Though the sparse Bayesian inference algorithms 
proposed in [5 | and [7| are guaranteed to converge, they are 
also known to suffer from high computational complexity and 
low convergence rate - many iterations are needed before 
they terminate. To circumvent this, a fast Bayesian inference 
algorithm, known as Fast-RVM, is proposed in 1101 . Following 
this approach, the Fast-Laplace algorithm is formulated in [8|. 
However, even though the algorithms in iflOl and (8 j do lead 
to faster convergence than their EM counterparts in [5 | and 
0, they still suffer from slow convergence especially in low 
and moderate signal-to-noise ratio (SNR) regimes as we show 
in this paper. 

The estimation of the wireless channel is a practical example 
where compressive sensing techniques are utilized. The reason 
is that the response of the wireless channel typically holds 
a few dominant multipath components and therefore has the 
characteristic of being sparse ifPTl . When sparse channel 
models are assumed it seems natural to use tools available 
from compressive sensing and sparse signal representation 
to estimate the parameters of said channel models. LASSO 
regression, OMP, and CoSaMP have been widely applied to 
the problem of pilot-assisted channel estimation in orthogo- 
nal frequency-division multiplexing (OFDM), cf., Ifl2l -H4l. 
Bayesian methods have also been previously proposed for 
wireless communication systems. Examples include the esti- 
mation of the dominant multipath components in the response 
of wireless channels lfl"5l and joint channel estimation and 
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1 The hierarchical representation involves specifying a conditional prior pdf 
p(ct\f) and a hyperprior pdf p(-f). 

2 Note that the hierarchical representation of the Laplace pdf used in 1 7 1 and 
|8 | is only valid for real-valued variables. In [9 |, we extend this representation 
to cover complex-valued variables as well. 



decoding for clustered sparse channels |[T6l . In iflTl , we have 
proposed a variational Bayesian inference algorithm for the 
estimation of the wireless channel in OFDM. The resulting 
estimator, however, suffers from the same complexity and 
convergence rate issues as those in [5] and Q. 

In this paper, we present a fast iterative sparse Bayesian 
estimation algorithm for pilot-assisted channel estimation in 
OFDM wireless receivers. We follow the fast inference frame- 
work outlined in iflOl based on the hierarchical prior model 
of the Bessel K pdf for sparse estimation that we propose in 
|9|, |17|. Our estimator drastically increases the convergence 
speed compared to similar algorithms such as Fast-RVM and 
Fast-Laplace with no penalization in performance and achieves 
favorable BER and mean-squared error (MSE) performance as 
compared to both Bayesian and non-Bayesian state-of-the-art 
methods. 

II. System Description 

A. OFDM Signal Model 

We consider a single-input single-output OFDM system 
with N subcarriers. A cyclic prefix (CP) is added to eliminate 
inter-symbol interference between consecutive OFDM blocks 
and the channel response is assumed static during the trans- 
mission of each OFDM block. The received baseband signal 



r € 



for a given OFDM block reads 
r = Xh + n. 



(2) 



The diagonal matrix X = diag(iri, x%, ■ ■ ■ , Xn) contains the 
complex-modulated symbols. The entries in h e C N are the 
samples of the channel frequency response at all N subcar- 
riers. Finally, n £ C N is a zero-mean complex symmetric 
Gaussian random vector whose entries are independent with 
variance A -1 . 

Let the pilot pattern be characterized by the set V C 
{1, . . . , N} containing the indices of subcarriers reserved for 
pilot transmission. The received signals observed at the pilot 
positions rp = [r n : n G V] T are then divided each by their 
corresponding pilot symbol in Xp — diag(a; n : n g V) to 
produce the vector of observations 



y = {X-p) 1 r v = hp + (Xp) 1 np> 



(3) 



where hp and np are defined analogously to rp. We assume 
that a\\ M = \P\ < N pilot symbols hold unit power so that 
the statistics of the noise term (Xp)~ 1 np remain unchanged. 

We consider a frequency-selective, block-fading wireless 
channel with impulse response modeled as a sum of multipath 
components: 



K 



(4) 



fc=i 



In this expression, /3& and are respectively the complex 
weight and the (continuous) delay of the fcth multipath com- 
ponent, K is the total number of multipath components, and 
5(-) is the Dirac delta function. The channel parameters 
Tfc, and K are all random variables and may vary from the 



transmission of one OFDM block to the next. Additional 
details regarding the assumptions on the channel model are 
provided in Section [IV] 

By using the parametric model © of the channel, we can 
rewrite (0 as 



y = T(t)(3 + w 



(5) 



with hp = T(t)(3, w = {X v )- x n P , (3 = [ft, . . . , T , 
r = [n, . . . , t k ] , and T(r) € C MxK with entries 

to = 1 , 2 , . . . , M 

k = l,2,...,K 

where f m denotes the frequency of the mth pilot subcarrier. 



T{r) m ,k = exp (-j27r/ m T fe ) , 



(6) 



B. Compressive Sensing Signal Model 

In order to apply sparse representation methods for the 
estimation of h in d2j, we must first recast the signal model in 
(0 into the form of ([T). The main limitation to do so is that 
the delay entries in r are, a priori, unknown at the receiver. 
To circumvent this, we consider a grid of uniformly-spaced 
delay samples in the interval [0,r max ]: 

T s 2T S it 



Td 



0. 



C C 



■ ,r„ 



(7) 



with C > such that C,T max /T s is an integer. The symbols 
Tmax and T s denote respectively the maximum excess delay 
of the channel and the sampling time. The dictionary matrix 
* € C Mxi is now defined as # = T{r d ). Thus, the 
entries of $ are of the form (|6]l with argument Td- The 
number of columns L = C,T max /T s + 1 in $ is thereby 
inversely proportional to the selected delay resolution T s /(. 
The selection of Td impacts the dimension of a. By assuming 
a vector a with many more entries than the number of 
multipath components, we expect most of the entries in a 
to be zero. Therefore, we use compressive sensing techniques 
to obtain sparse estimates of ct. 

Notice that the signal model (Q]l with $ = T(Td) is an 
approximation of the true signal model (|5}- The estimate of 
the channel vector at the pilot subcarriers is then hp = $S. 
In order to estimate the full channel h in (0 the dictionary $ 
is appropriately expanded to include a row corresponding to 
each of the N subcarrier frequencies. Thus, h = <& full S: with 



full A 



exp(-j27r/ n r dl ) 



n = 1,2, . . . , N 
1 = 1,2,. ..,L 
where /„ denotes the frequency of the nth subcarrier. 



(8) 



III. Bayesian Inference Learning 

We now present the iterative sparse Bayesian inference 
algorithm for channel estimation proposed in this paper. First, 
we detail the hierarchical prior model leading to the Bessel K 
pdf for each entry of a. Based on this model, we apply a fast 
Bayesian algorithm to estimate the unknown model parame- 
ters. Finally, we briefly comment on the relationship between 
our algorithm and other similar state-of-the-art approaches. 



A. The Probabilistic Model 



equivalently computes (fTTT i- (fT2l > and the M-step computes 



Instead of working directly with the prior pdf p(a), in 
the SBL framework, p(ot) is usually modeled using a two- 
layer hierarchical prior model involving a conditional prior 
pdf p(a\-f) and a hyperprior pdf p(j). With this design, the 
resulting probabilistic model for signal model ([T) is given by 

P(y, 7, A) = p(y\a, X)p(X)p(a\'y)p('y) 

L 

= p(y\a,x)pWY[p( a ih)p(n)- (9) 

1=1 

Due to ([T), p(y\ct,X) is multivariate Gaussian: p(y\a,X) = 
CN(y\&a, A _1 7)|j For the noise precision A, we select a 
constant prior, i.e., p(X) oc 1. 

The design of the factors p(ai\ji) and p(ji) for each 
weight on heavily influences the sparsity-inducing property 
of the prior model. We adopt the hierarchical structure of the 
Bessel K pdf, where the first layer is defined as p{ai\ r yi) = 
CN(a;|0,7;) and the second layer is selected to be p{pfi) = 
Ga(7;|e, 77). With these choices, we compute the marginal pdf 



p(oti;e,ii) = 



2rj- 



7rT(e) 



<*i 



L ^i(2^|az| 



GO) 



In this expression, K v (-) is the modified Bessel function of 
the second kind and order v £ R. The parameter e determines 
the sparsity-inducing property of the Bessel K pdf [9|. The 
selection e = greatly enforces sparseness on the estimate 
as more probability mass concentrates around the origin. 
As a consequence, the mode of the resulting posterior pdf 
p(ot\y,e,r]) is more likely to be found close to the axes. 
However, selecting a too high e (e > 1) may lead to over- 
fitting and thereby non-sparse results. Thus, this parameter 
has a similar functionality as the parameter p in the FOCUSS 
algorithm lfl8l . 

B. Fast Iterative Bayesian Inference 

Given fixed estimates 7 and A, the posterior pdf 
p(a\y,'j, A) is a multivariate Gaussian, i.e., p(a\y,j, \) = 

CN(a|/i,£) with 



a* h * + r~ 



fi = AE* H y 



(11) 
(12) 



where T = diag(7i, . .. ,7i). The hyperparameters 7 and A 
are estimated by maximizing Q, 



£(7, A) = log(p(y|7, A)p(7)p(A)). 



(13) 



The cost function ( fT3l can be iteratively maximized using the 
EM algorithm by noting that a and y are complete data for 
7 and A. Following the classical EM formulation, the E-step 

3 Here, CN(-|a, B) denotes a complex Gaussian pdf with mean vector 
a and covariance matrix B. We shall also make use of Ga(-|a, b) = 
r(a) x<1 ~ 1 exp(-bx), which denotes a gamma pdf with shape parameter a 
and rate parameter b. 



. (e - 2) + y /(e- 2)2+477(1^) 
7; = - — — > « = !,...,£, (14) 



2r? 



A = 



M 



\y - *a||l) 



2\ • 



(15) 



The expectation (■) in the above expressions are evaluated 
with respect to the posterior pdf p(a.\y, 7, A), where 7 and A 
are the estimates computed in the previous iteration. After an 
initialization procedure, the individual quantities in (ITTT) — (TT~2b 
and (TT4l >— <fT~5b are iteratively updated until convergence. 

The above EM algorithm suffers from two main disad- 
vantages: high computational complexity of the update (fTTI) 
and low rate of convergence. In order to overcome the first 
drawback a greedy procedure as in iTTOl can be adopted: as 
most of the entries in a are mostly zero, one may start 
out with an "empty" dictionary matrix and incrementally fill 
the dictionary by adding column vectors. To circumvent the 
drawback of low convergence rate, we compute the stationary 
points of the EM update 7; in (TPfl i. For this, we fix %, 
k 7^ I at their current estimates, while computing a sequence of 
estimates {^fl^li according to (fT4l for T — > oo01n this way, 
we update the estimates of the components in {71, . . . ,Jn} 
sequentially, instead of jointly. The generalized EM framework 
justifies this modification. As shown in (9), 7!°°' corresponds 
in fact to the (local) extrema of 

£(-jl) =£(7;,7_ ( ,A) = -log|l+7;s;| 



1 9*1 



+ (e - l)log7z - Til + c (16) 

with c being a constant encompassing the terms independent 
of 7; and the definitions si = 4>i CZi4>i, qi — y U CZj4>h an< ^ 
C - A- 1 / + J2 kf} + 7/ 0,0f = C_ ; + 7/0/0F0 

Note that the definition domain of £(ji) is R + . Now, taking the 
derivative of with respect to 7/ and equating the result 
to zero yields the cubic equation 







+ 7i[r/+(3-2e) S; -M 2 ]-(e-l). 



(17) 



In general ( TlTb has three solutions when 7/ ranges through R. 
These can be determined analytically with a feasible solution 
for 7; constrained to be positive. The analysis of the sparsity- 
inducing property of the Bessel K pdf in [9| shows that we 
should select e small. When e < 1, ( TTTb has at least one 
negative solution as — (e — 1) > 0. Therefore, (fTTt has either 
no real positive solution or two real positive solutions 7,^ 
and 7. . In the former case, no feasible solution to £(71) 
exists and the corresponding column vector <p, is not added 

(i) 

to the dictionary. In the latter case, we simply select 7 Z if 
£(7, (i) ) > £{i\ U) ) and 7; ( " } otherwise. 

We follow the approach in IfTOl and realize the proposed 

4 Notice that ( | ct ; | 2 ) in 1141 is a function of 7; as seen from i ll II and 1121 . 
5 For the derivation of l{"fi), we exploit that p(j/|7,A) is Gaussian with 



mean zero and covariance matrix C = A 



-1 i 



n 




fast iterative Bayesian inference algorithm by computing each 
7;, I = 1, . . . , L, and selecting the one 7 that gives rise to the 
greatest increase in £(^1) between two consecutive iterations. 
Depending on the new value ji, we may then add, delete, or 
keep the corresponding column vector 4> l in the dictionary. The 
quantities S, (1, and A are updated using ill It . dl2l . and (fT5T l 
together with the computation of s; and qi, I = 1, . . . , L. The 
computational complexity of each iteration is O(LMK) when 
K < M < L, where K is the number of nonzero components 
in fi. If A is not updated between two consecutive iterations, 
X, (1, 81, and qi can be updated efficiently according to the 
update procedures in ifTOl . In this case the cost in complexity 
is only O(LM). We refer to the proposed algorithm as Fast- 
BesselK. 

C. Fast-RVM and Fast-Laplace 

The Fast-BesselK algorithm described in Section IIII-BI is 
parametrized by e and r). In the following, we will show how, 
by appropriately setting these parameters, we can obtain Fast- 
RVM iflOl and Fast-Laplace [8| as particular instances of Fast- 
BesselK. For Fast-RVM, the estimation of 72 relies on the 
maximization of the likelihood p(y\"fi, 7_;, A), i.e., a constant 
prior is assumed for the hyperprior, ^(7;) oc 1. Hence, by 
selecting e = 1 and r\ = we obtain the cost function £(71) 
used in IfTOl . In case of Fast-Laplace J8), the exponential 
pdf is selected for p(ji). As the gamma pdf reduces to the 
exponential pdf by choosing its shape parameter e = 1, we 
obtain ^(7;) used in [8| from this choice. 

IV. Numerical Results 

We perform Monte Carlo simulations to evaluate the per- 
formance of Fast-BesselK derived in Section [TTTJ We consider 
a scenario inspired by the 3GPP LTE standard [20] with the 
settings specified in Table Q] In all investigations conducted 
we fix the spectral efficiency of n = Md(N — M)R/N = 
0.92 information bits per subcarrier, which corresponds to 
a rate R = 1/2 code. We note that we employ a rate- 1/3 
convolutional code and use puncturing in order to increase 
the spectral efficiency. Unless otherwise specified, M = 100 
evenly-spaced pilot symbols are used. 



TABLE I 

Parameter settings for the simulations. 



Sampling time, T s 


32.55 ns 


CP length 


4.69 fj,s / 144 T s 


Subcarrier spacing 


15 kHz 


Pilot pattern 


Evenly spaced, QPSK 


Modulation 


QPSK (M d = 2) 


Subcarriers, N 


1200 


OFDM symbols 


1 


Information bits 


1091 


Channel interleaver 


Random 


Convolutional code 


(133,171,165) 8 


Decoder 


BCJR algorithm ED 



The multipath channel (0]i is based on the model used 
in ETI where, for each realization of the channel, the total 
number of multipath components K is Poisson distributed 
with mean (K) = 10 and the delays 7%, k = 1, . . . , K, are 
independent and uniformly distributed random variables drawn 
from the continuous interval [0, 144 T s ], Conditioned on Tfe, 
k = 1, . . . , K, the weights /3fe, k = 1, . . . , K, are independent, 
and weight /3& has a zero-mean complex circular symmetric 
Gaussian distribution with variance a 2 (rk) = uexp(— Tk/v) 
and parameters u,v > 0@ In this way {rfc, j3k] form a marked 
Poisson process. 

For Fast-BesselK, we set e = 0.5 and rj = 1 in all 
investigations. We empirically observed that this is a proper 
selection of parameters for channel models with both few and 
numerous multipath components. Fast-BesselK is compared to 
two Bayesian methods, Fast-RVM liTOlPl and Fast-Laplace |8@. 
For these three algorithms the noise precision A is estimated 
at every third iteration with the initialization Var(y)/100 1 10 1 . 
The stopping criterion is based on the difference in £(ji) 
between two consecutive iterations ll22l . Two non-Bayesian 
methods, LASSO and OMR are also included for comparison. 
For LASSO, we use the sparse reconstruction by separable 

6 The parameter u is computed such that (X]fcLi IA| 2 ) = 1. In the 
considered simulation scenario, (K) = 10, r raax = 144 T s , and v = 40 T s . 
7 The software is available at http://people.ee.duke.edu/~lcarin/BCS.html 
8 The software is available at http://ivpl.eecs.northwestem.edu/ 




approximation (SpaRSA) algorithm [23 0. The required reg- 
ularization parameter is chosen as 5^/log(L)/A lf24l . which 
has been empirically observed to provide satisfactory results. 
For OMP, an a priori estimate of the sparsity of a. needs to 
be set. In all investigations we use (K) + 10. Finally, the 
commonly employed robustly designed Wiener filter (RWF) 
ll25l for OFDM channel estimation is used as a reference. 

Unless otherwise specified, we set the number of rows in 
<& to M = 100 (pilot subcarriers) and the number of columns 
in 4> to L = 200, which corresponds to a delay resolution of 
T s /C = 0.72 T s . The performance versus SNR is compared in 
Figs. |l(a)|l(b)| From Fig. |l(a)[ we see that Fast-BesselK and 
Fast-Laplace outperform the other algorithms in terms of BER 
across all the SNR range considered. Specifically, at 1 % BER 
the gain is apporiximatly 1 dB over Fast-RVM, LASSO, and 
OMP and 2 dB over RWF. Fig. [T(b)l shows how Fast-BesselK 
yields a lower MSE than the other algorithms. Surprisingly, 
the improved performance in MSE achieved by Fast-BesselK 
does not lead to a better BER performance when compared to 
Fast-Laplace. 

The convergence speed of the Bayesian iterative algorithms 
is shown in Fig. |l(c)| Here, Fast-BesselK achieves a remark- 
able improvement compared to Fast-RVM and Fast-Laplace 
with MSE values converging in about 10-30 iterations. As 
Fig. |l(c)| shows, there is no guarantee that the MSE is reduced 
at each iteration, due to the objective function (fT3l . Fast-RVM 
and Fast-Laplace suffer a significant increase in MSE after 
a certain number of iterations; this drawback is significantly 
mitigated in the case of Fast-BesselK. The superior conver- 
gence speed of Fast-BesselK can be explained by observing 
Figs. |2(a)|2(b)| Fig. |2(b)| shows that the improvement in 
convergence rate comes as the Besssel K prior can handle 
channels with few multipath components better (i.e., yields 
lower MSE). As a consequence, the other methods tend to add 
more column vectors to the dictionary matrix, thus, increasing 
the number of add, delete, and reestimate iterations as seen 
from Fig. 2(a)| 

Fig. |2(c) shows the MSE versus the number of pilots M. 
We observe that, for a given MSE performance, Fast-BesselK 
is able to significantly reduce the required pilot overhead. 
In particular, Fast-BesselK achieves an MSE on pair with 

'The software is available on-line at http://www.lx.it.pt/~mtf/SpaRSA/ 



LASSO, OMP, and RWF using less than half the number of 
pilots. Finally, in Fig. |2(d)| we evaluate the MSE performance 
versus available delay resolution determined by the number 
of columns L in <fr (cf., Section Several observations 

are worth being noticed. Fast-BesselK leads to a noticeable 
MSE performance gain as the delay resolution improves as 
opposed to the other algorithms. In fact, it appears that, besides 
Fast-BesselK, only OMP is able to exploit the improved delay 
resolution. The reason for this is that LASSO, Fast-RVM, and 
Fast-Laplace produce a solution h-p = $3 with an increasing 
number of nonzero components K in a when increasing L 
(there are simply more column vectors in to be added or 
deleted). Thus, these algorithms also require an increasing 
amount of iterations to be run as opposed to Fast-BesselK 
(results not shown). 

V. Conclusion 

In this work, we presented a fast iterative Bayesian infer- 
ence channel estimation algorithm based on the hierarchical 
Bayesian prior model of the Bessel K probability density 
function. Following the framework for fast Bayesian inference 
in flOl . we proposed an algorithm that significantly lowers 
the number of needed iterations as compared to state-of- 
the-art Bayesian inference methods with no penalization in 
performance. This improvement in convergence rate is directly 
related to the Bessel K prior's ability to handle channels 
with few multipath components better than other commonly 
employed prior models. Furthermore, our algorithm shows 
improved performance when compared to both Bayesian and 
non-Bayesian state-of-the-art methods. 
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